AITopics | semantic scholar

Collaborating Authors

semantic scholar

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review

Su, Buxin, Collina, Natalie, Wen, Garrett, Li, Didong, Cho, Kyunghyun, Fan, Jianqing, Zhao, Bingxin, Su, Weijie

arXiv.org Artificial IntelligenceNov-26-2025

Peer review in academic research aims not only to ensure factual correctness but also to identify work of high scientific potential that can shape future research directions. This task is especially critical in fast-moving fields such as artificial intelligence (AI), yet it has become increasingly difficult given the rapid growth of submissions. In this paper, we investigate an underexplored measure for identifying high-impact research: authors' own rankings of their multiple submissions to the same AI conference. Grounded in game-theoretic reasoning, we hypothesize that self-rankings are informative because authors possess unique understanding of their work's conceptual depth and long-term promise. To test this hypothesis, we conducted a large-scale experiment at a leading AI conference, where 1,342 researchers self-ranked their 2,592 submissions by perceived quality. Tracking outcomes over more than a year, we found that papers ranked highest by their authors received twice as many citations as their lowest-ranked counterparts; self-rankings were especially effective at identifying highly cited papers (those with over 150 citations). Moreover, we showed that self-rankings outperformed peer review scores in predicting future citation counts. Our results remained robust after accounting for confounders such as preprint posting time and self-citations. Together, these findings demonstrate that authors' self-rankings provide a reliable and valuable complement to peer review for identifying and elevating high-impact research in AI.

artificial intelligence, citation count, submission, (18 more...)

arXiv.org Artificial Intelligence

2510.02143

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

Shi, Xiaofeng, Li, Yuduo, Kou, Qian, Yu, Longbin, Xie, Jinxin, Zhou, Hua

arXiv.org Artificial IntelligenceJul-22-2025

Recent advances in large language models (LLMs) have opened new opportunities for academic literature retrieval. However, existing systems often rely on rigid pipelines and exhibit limited reasoning capabilities. We introduce SPAR, a multi-agent framework that incorporates RefChain-based query decomposition and query evolution to enable more flexible and effective search. To facilitate systematic evaluation, we also construct SPARBench, a challenging benchmark with expert-annotated relevance labels. Experimental results demonstrate that SPAR substantially outperforms strong baselines, achieving up to +56% F1 on AutoScholar and +23% F1 on SPARBench over the best-performing baseline. Together, SPAR and SPARBench provide a scalable, interpretable, and high-performing foundation for advancing research in scholarly retrieval. Code and data will be available at: https://github.com/xiaofengShi/SPAR

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.15245

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ACL-rlg: A Dataset for Reading List Generation

Aubert-Béduchaud, Julien, Boudin, Florian, Daille, Béatrice, Dufour, Richard

arXiv.org Artificial IntelligenceDec-30-2024

Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large amount of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study highlights the fact that traditional scholarly search engines and indexing methods perform poorly on this task, and GPT-4o, despite showing better results, exhibits signs of potential data contamination.

query, reading list, reading list generation, (14 more...)

arXiv.org Artificial Intelligence

2502.15692

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.05)
Asia > Singapore (0.04)
(5 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing

Schopf, Tim, Matthes, Florian

arXiv.org Artificial IntelligenceJul-4-2024

Scientific literature searches are often exploratory, whereby users are not yet familiar with a particular field or concept but are interested in learning more about it. However, existing systems for scientific literature search are typically tailored to keyword-based lookup searches, limiting the possibilities for exploration. We propose NLP-KG, a feature-rich system designed to support the exploration of research literature in unfamiliar natural language processing (NLP) fields. In addition to a semantic search, NLP-KG allows users to easily find survey papers that provide a quick introduction to a field of interest. Further, a Fields of Study hierarchy graph enables users to familiarize themselves with a field and its related areas. Finally, a chat interface allows users to ask questions about unfamiliar concepts or specific articles in NLP and obtain answers grounded in knowledge retrieved from scientific publications. Our system provides users with comprehensive exploration possibilities, supporting them in investigating the relationships between different fields, understanding unfamiliar concepts in NLP, and finding relevant research literature. Demo, video, and code are available at: https://github.com/NLP-Knowledge-Graph/NLP-KG-WebApp.

computational linguistic, literature, publication, (13 more...)

arXiv.org Artificial Intelligence

2406.15294

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Singapore (0.04)
(11 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias

Algaba, Andres, Mazijn, Carmen, Holst, Vincent, Tori, Floriano, Wenmackers, Sylvia, Ginis, Vincent

arXiv.org Artificial IntelligenceMay-29-2024

Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) like GPT-4 introduces a new dynamic to these practices. Interestingly, the characteristics and potential biases of references recommended by LLMs that entirely rely on their parametric knowledge, and not on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date, encompassing 3,066 references in total. In our experiment, GPT-4 was tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced high citation bias in GPT-4, which persists even after controlling for publication year, title length, number of authors, and venue. Additionally, we observe a large consistency between the characteristics of GPT-4's existing and non-existent generated references, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the references recommended by GPT-4 are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases and introduce new ones, potentially skewing scientific knowledge dissemination. Our results underscore the need for identifying the model's biases and for developing balanced methods to interact with LLMs in general.

citation count, gpt-4, ground truth reference, (12 more...)

arXiv.org Artificial Intelligence

2405.15739

Country:

North America > United States > Virginia (0.04)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TL;DR Progress: Multi-faceted Literature Exploration in Text Summarization

Syed, Shahbaz, Al-Khatib, Khalid, Potthast, Martin

arXiv.org Artificial IntelligenceFeb-10-2024

This paper presents TL;DR Progress, a new tool for exploring the literature on neural text summarization. It organizes 514~papers based on a comprehensive annotation scheme for text summarization approaches and enables fine-grained, faceted search. Each paper was manually annotated to capture aspects such as evaluation metrics, quality dimensions, learning paradigms, challenges addressed, datasets, and document domains. In addition, a succinct indicative summary is provided for each paper, consisting of automatically extracted contextual factors, issues, and proposed solutions. The tool is available online at https://www.tldr-progress.de, a demo video at https://youtu.be/uCVRGFvXUj8

computational linguistic, information, summarization, (16 more...)

arXiv.org Artificial Intelligence

2402.06913

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Add feedback

AI and Generative AI for Research Discovery and Summarization

Glickman, Mark, Zhang, Yi

arXiv.org Artificial IntelligenceJan-8-2024

AI and generative AI tools, including chatbots like ChatGPT that rely on large language models (LLMs), have burst onto the scene this year, creating incredible opportunities to increase work productivity and improve our lives. Statisticians and data scientists have begun experiencing the benefits from the availability of these tools in numerous ways, such as the generation of programming code from text prompts to analyze data or fit statistical models. One area that these tools can make a substantial impact is in research discovery and summarization. Standalone tools and plugins to chatbots are being developed that allow researchers to more quickly find relevant literature than pre-2023 search tools. Furthermore, generative AI tools have improved to the point where they can summarize and extract the key points from research articles in succinct language. Finally, chatbots based on highly parameterized LLMs can be used to simulate abductive reasoning, which provides researchers the ability to make connections among related technical topics, which can also be used for research discovery. We review the developments in AI and generative AI for research discovery and summarization, and propose directions where these types of tools are likely to head in the future that may be of interest to statistician and data scientists.

chatgpt, distance matrix, reasoning, (16 more...)

arXiv.org Artificial Intelligence

2401.06795

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Science and Technology Ontology: A Taxonomy of Emerging Topics

Kumar, Mahender, Rani, Ruby, Botarelli, Mirko, Epiophaniou, Gregory, Maple, Carsten

arXiv.org Artificial IntelligenceMay-6-2023

Ontologies play a critical role in Semantic Web technologies by providing a structured and standardized way to represent knowledge and enabling machines to understand the meaning of data. Several taxonomies and ontologies have been generated, but individuals target one domain, and only some of those have been found expensive in time and manual effort. Also, they need more coverage of unconventional topics representing a more holistic and comprehensive view of the knowledge landscape and interdisciplinary collaborations. Thus, there needs to be an ontology covering Science and Technology and facilitate multidisciplinary research by connecting topics from different fields and domains that may be related or have commonalities. To address these issues, we present an automatic Science and Technology Ontology (S&TO) that covers unconventional topics in different science and technology domains. The proposed S&TO can promote the discovery of new research areas and collaborations across disciplines. The ontology is constructed by applying BERTopic to a dataset of 393,991 scientific articles collected from Semantic Scholar from October 2021 to August 2022, covering four fields of science. Currently, S&TO includes 5,153 topics and 13,155 semantic relations. S&TO model can be updated by running BERTopic on more recent datasets.

artificial intelligence, machine learning, ontology, (14 more...)

arXiv.org Artificial Intelligence

2305.04055

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Add feedback

Impact, Attention, Influence: Early Assessment of Autonomous Driving Datasets

Bogdoll, Daniel, Hendl, Jonas, Schreyer, Felix, Gowda, Nishanth, Färber, Michael, Zöllner, J. Marius

arXiv.org Artificial IntelligenceMar-31-2023

Autonomous Driving (AD), the area of robotics with the greatest potential impact on society, has gained a lot of momentum in the last decade. As a result of this, the number of datasets in AD has increased rapidly. Creators and users of datasets can benefit from a better understanding of developments in the field. While scientometric analysis has been conducted in other fields, it rarely revolves around datasets. Thus, the impact, attention, and influence of datasets on autonomous driving remains a rarely investigated field. In this work, we provide a scientometric analysis for over 200 datasets in AD. We perform a rigorous evaluation of relations between available metadata and citation counts based on linear regression. Subsequently, we propose an Influence Score to assess a dataset already early on without the need for a track-record of citations, which is only available with a certain delay.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICCRE57112.2023.10155607

2301.022

Country:

Asia > Thailand (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (1.00)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.92)
Information Technology > Robotics & Automation (0.82)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

M2D2: A Massively Multi-domain Language Modeling Dataset

Reid, Machel, Zhong, Victor, Gururangan, Suchin, Zettlemoyer, Luke

arXiv.org Artificial IntelligenceOct-13-2022

We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs). M2D2 consists of 8.5B tokens and spans 145 domains extracted from Wikipedia and Semantic Scholar. Using ontologies derived from Wikipedia and ArXiv categories, we organize the domains in each data source into 22 groups. This two-level hierarchy enables the study of relationships between domains and their effects on in- and out-of-domain performance after adaptation. We also present a number of insights into the nature of effective domain adaptation in LMs, as examples of the new types of studies M2D2 enables. To improve in-domain performance, we show the benefits of adapting the LM along a domain hierarchy; adapting to smaller amounts of fine-grained domain-specific data can lead to larger in-domain performance gains than larger amounts of weakly relevant data. We further demonstrate a trade-off between in-domain specialization and out-of-domain generalization within and across ontologies, as well as a strong correlation between out-of-domain performance and lexical overlap between domains.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.0737

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(8 more...)

Genre: Research Report > New Finding (0.94)

Industry: Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.90)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback